A Bioinformatics Approach to MicroRNA-Sequencing Analysis Based on Human Saliva Samples of Patients with Endometriosis

Endometriosis, defined by the presence of endometrium-like tissue outside the uterus, affects 2–10% of the female population, i.e., around 190 million women worldwide. The aim of the prospective ENDOmiARN study was to develop a bioinformatics approach for microRNA-sequencing analysis of the miRNAome of 200 saliva samples and to test its diagnostic accuracy for endometriosis. Among the 200 patients, 76.5% (n = 153) had confirmed endometriosis and 23.5% (n = 47) had no endometriosis (controls). Small RNA-seq of the 200 saliva samples yielded ~4642 M raw sequencing reads (from ~13.7 M to ~39.3 M reads/sample). The number of expressed miRNAs ranged from 1250 (outlier) to 2561 per sample. Of the 2561 miRNAs identified, 1.17% (n = 30) were differentially expressed (up- or downregulated) in the saliva of patients with endometriosis compared with controls. For these 30 miRNAs, the F1-score, sensitivity, specificity, and AUC ranged from 11–86.8%, 5.8–97.4%, 10.6–100%, and 39.3–69.2%, respectively. Here, we report a bioinformatics approach to saliva miRNA sequencing and analysis. We underline the advantages of using saliva over blood in terms of ease of collection, reproducibility, stability, safety, and non-invasiveness. This report describes the whole saliva transcriptome with the aim of making miRNA quantification a validated, standardized, and reliable technique for routine use. The methodology could be applied to build a saliva signature of endometriosis.


Introduction
MicroRNAs (miRNAs) are small, highly conserved non-coding RNAs of about 22 nucleotides that bind to the 3′-untranslated region (3′-UTR) of target messenger RNAs (mRNAs), thereby regulating gene expression post-transcriptionally through mRNA degradation and/or translational inhibition [1,2]. Schematically, miRNA biosynthesis involves several steps: (i) miRNA genes, located in intronic regions of coding or non-coding transcripts or in exons, are transcribed by RNA polymerase II into primary miRNAs (pri-miRNAs) that are hundreds of nucleotides long and contain a hairpin duplex; (ii) the pri-miRNA is then cleaved by a complex formed by the RNase III enzyme Drosha and its RNA-binding cofactor Pasha to generate a precursor miRNA (pre-miRNA); and (iii) the pre-miRNA is exported from the nucleus to the cytoplasm by exportin 5, where Dicer cleaves it into a duplex that is unwound by a helicase to release the mature miRNA [2,3]. The mature miRNA is then incorporated into the RNA-induced silencing complex (RISC), which represses gene expression by binding to the 3′-UTR of the target mRNA. Finally, miRNAs are released from cells into the circulation by various carriers, such as Argonaute, nucleophosmin 1, high-density lipoproteins, or extracellular vesicles (exosomes), and are distributed in body fluids, where they can be detected [4,5].
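The 3′-UTR binding step can be illustrated with a simplified Python sketch that scans a 3′-UTR for canonical seed-match sites, i.e., stretches that pair perfectly (Watson–Crick) with miRNA nucleotides 2–8. This is an illustrative simplification, not part of the ENDOmiARN pipeline: it ignores G:U wobble pairing, site context, and supplementary pairing, the function names are hypothetical, and both sequences are toy examples.

```python
from typing import List

# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna: str) -> str:
    """mRNA sequence that pairs perfectly with miRNA nucleotides 2-8."""
    seed = mirna[1:8]  # nucleotides 2-8 (1-based) of the mature miRNA
    # The target strand is antiparallel: reverse the seed, then complement each base.
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def find_seed_sites(mirna: str, utr: str) -> List[int]:
    """0-based start positions of every exact seed-match site in a 3'-UTR."""
    site = seed_site(mirna)
    hits = []
    start = utr.find(site)
    while start != -1:
        hits.append(start)
        start = utr.find(site, start + 1)
    return hits

if __name__ == "__main__":
    toy_mirna = "UGAGGUAGUAGGUUGUGUGGUU"   # toy 22-nt sequence
    toy_utr = "AAACUACCUCAAAGGCUACCUCA"    # toy 3'-UTR with two planted sites
    print(seed_site(toy_mirna))                 # CUACCUC
    print(find_seed_sites(toy_mirna, toy_utr))  # [3, 15]
```

Because the mRNA pairs antiparallel to the miRNA, the site searched for is the reverse complement of the seed, not the seed itself.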
The goal of the prospective ENDOmiARN study was to develop a bioinformatics approach for microRNA-sequencing analysis of the miRNAome of 200 saliva samples and to test its diagnostic accuracy for endometriosis.

Study Population
We used data from the prospective "ENDOmiARN" study (ClinicalTrials.gov Identifier: NCT04728152). Data collection and analysis were carried out under Research Protocol ID RCB no. 2020-A03297-32. Signed informed consent was obtained from all study participants. The experimental protocol was approved by the ethics committee Comité de Protection des Personnes (CPP) Sud-Ouest et Outre-Mer 1 (CPP 1-20-095 ID 10476).
The ENDOmiARN study included 200 saliva samples obtained from patients with chronic pelvic pain suggestive of endometriosis. All the samples were collected between January 2021 and June 2021. Analysis was performed blinded to the surgical and imaging findings. The patients with endometriosis were stratified according to the revised American Society of Reproductive Medicine (rASRM) classification [17]. The main characteristics of the patients included in the ENDOmiARN study are displayed in Table 1.

Saliva Sample Collection
The saliva samples (2 mL) were collected with an at-home kit consisting of an all-in-one system that includes a nucleic acid stabilizing solution for collection, stabilization, and transportation (OME 505, DNA Genotek Inc., 2 Beaverbrook Road, Ottawa, ON, Canada K2K 1L1; https://www.dnagenotek.com/row/products/collection-microbiome/omnigene-oral/OME-505.html, accessed on 1 January 2021). Subjects were asked to refrain from eating, drinking, smoking, or chewing gum for 30 min before providing the saliva sample. All samples were stored at room temperature before shipping.

RNA Sample Extraction, Preparation and Quality Control
RNA was isolated from each saliva sample using the miRNeasy Kit (Qiagen, Inc., Germantown, MD, USA) according to the manufacturer's instructions [6,8,9]. In accordance with the DNA Genotek extraction protocol, each sample was systematically centrifuged at 13,300× g for 3 min. RNA quality was assessed using an Agilent TapeStation 2200 (Agilent Technologies). RNA-sequencing libraries were prepared using the QIAseq miRNA Library Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Samples were indexed in batches of 96, with a target sequencing depth of 17 million reads per sample. Sequencing was performed with 100-base single-end reads on a NovaSeq 6000 sequencer (Illumina, San Diego, CA, USA) [14,18]. This workflow follows the one summarized by Potla et al. [13].
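The sequencing-depth target can be checked with simple arithmetic. The Python sketch below assumes, for illustration only, that the 200 samples were split into consecutive 96-sample index batches; the batch size, the 17 M target, and the ~4642 M total come from this report, whereas the helper name `plan_runs` is hypothetical.

```python
import math

def plan_runs(n_samples: int, batch_size: int = 96, target_depth: float = 17e6):
    """Return (index batches needed, read budget per batch, total read budget)."""
    n_batches = math.ceil(n_samples / batch_size)  # hypothetical: consecutive 96-sample batches
    reads_per_batch = batch_size * target_depth    # reads needed to fill one full batch
    total_target = n_samples * target_depth        # reads needed for all samples at target depth
    return n_batches, reads_per_batch, total_target

n_batches, per_batch, total_target = plan_runs(200)
observed_mean = 4642e6 / 200  # ~4642 M raw reads reported over 200 samples

print(n_batches)                      # 3
print(per_batch / 1e6)                # 1632.0 (M reads per full batch)
print(total_target / 1e6)             # 3400.0 (M reads for 200 samples)
print(round(observed_mean / 1e6, 1))  # 23.2 (M reads per sample, observed mean)
```

The observed mean of ~23.2 M reads per sample is above the 17 M target and agrees with the mean of 23 million reported in the Discussion.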

Differential Expression Analysis of miRNA
miRNA expression was quantified with miRDeep2 v0.1.0 [22]. Differential expression tests were then conducted in DESeq2 for miRNAs with read counts in ≥1 sample. DESeq2 v1.20 integrates methodological advances with several novel features to facilitate a more quantitative analysis of comparative RNA-seq data, using shrinkage estimators for dispersion and fold change [23]. The resulting matrix was filtered for expressed miRNAs and normalized using Z-score normalization [24]. miRNAs were considered differentially expressed if the fold change was >1.5 (upregulated) or <0.5 (downregulated) and the p value adjusted for multiple testing was <0.05 [23].
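The selection rule can be sketched in Python as follows. This is a minimal illustration, not the DESeq2 code used in the study: it reproduces the Benjamini–Hochberg adjustment (the default `pAdjustMethod` in DESeq2's `results()`) but omits DESeq2's dispersion shrinkage and independent filtering, so its adjusted p values will not match DESeq2 output exactly. It assumes that the 1.5 and 0.5 cut-offs apply to the linear fold change (2 raised to the log2 fold change). The `zscore` helper shows per-miRNA Z-score scaling across samples; using the sample standard deviation is an assumption.

```python
import statistics
from typing import List, Sequence

def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values (step-up, monotone, capped at 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_de(log2fc: Sequence[float], pvalues: Sequence[float],
            up_fc: float = 1.5, down_fc: float = 0.5, alpha: float = 0.05) -> List[str]:
    """Label each miRNA 'up', 'down' or 'ns' from fold-change and adjusted-p cut-offs."""
    padj = benjamini_hochberg(pvalues)
    calls = []
    for lfc, q in zip(log2fc, padj):
        fc = 2.0 ** lfc  # assumption: cut-offs apply to the linear fold change
        if q < alpha and fc > up_fc:
            calls.append("up")
        elif q < alpha and fc < down_fc:
            calls.append("down")
        else:
            calls.append("ns")
    return calls

def zscore(values: Sequence[float]) -> List[float]:
    """Scale one miRNA's normalized counts across samples to mean 0 and SD 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; needs >= 2 non-identical values
    return [(v - mean) / sd for v in values]

if __name__ == "__main__":
    print(call_de([1.0, -2.0, 0.2, 3.0], [0.001, 0.002, 0.03, 0.5]))
    # ['up', 'down', 'ns', 'ns']
    print(zscore([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```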

miRNA Diagnostic Accuracy
To evaluate the diagnostic accuracy of each miRNA biomarker, sensitivity, specificity, and F1-score were calculated, a receiver operating characteristic (ROC) analysis was performed, and the area under the ROC curve (AUC) was calculated [25,26].
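These metrics can be computed as in the following Python sketch. It assumes endometriosis is coded as the positive class (1) and controls as 0; the AUC is computed through its Mann–Whitney interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control, ties counting one half), which equals the trapezoidal area under the empirical ROC curve. Function names are hypothetical, and the toy labels are not study data.

```python
from typing import Dict, Sequence

def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and F1-score, with 1 = endometriosis and 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}

def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random case > score of a random control), ties counted as 1/2."""
    cases = [s for s, t in zip(scores, y_true) if t == 1]
    controls = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0]            # toy labels, not study data
    scores = [0.9, 0.8, 0.3, 0.4, 0.1]  # toy expression-based scores
    preds = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 cut-off
    print(diagnostic_metrics(labels, preds))
    print(roc_auc(labels, scores))  # 5 of 6 case-control pairs ranked correctly
```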
Additional statistical analysis used the χ2 test, as appropriate, for categorical variables. Values of p < 0.05 were considered to denote significant differences. Data were managed with an Excel database (Microsoft, Redmond, WA, USA) and analyzed using R 2.15 software (http://cran.r-project.org/, accessed on 1 January 2021).
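For a 2 × 2 table of categorical counts, the χ2 test can be sketched with the Python standard library. The p value uses the identity that a χ2 variable with one degree of freedom is the square of a standard normal, so P(χ2 > x) = erfc(√(x/2)). R's `chisq.test()` applies Yates' continuity correction to 2 × 2 tables by default (`correct = TRUE`); the sketch exposes it as the `yates` flag. The function name and the counts are hypothetical.

```python
import math
from typing import Tuple

def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = False) -> Tuple[float, float]:
    """Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]]; returns (statistic, p value)."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(0.0, diff - 0.5)  # Yates' continuity correction
            stat += diff ** 2 / expected
    # One degree of freedom: P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

if __name__ == "__main__":
    # Hypothetical counts for one categorical variable in cases vs controls.
    print(chi_square_2x2(10, 20, 30, 40))
    print(chi_square_2x2(10, 20, 30, 40, yates=True))
```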

Description of the ENDOmiARN Cohort
Among the 200 patients, 76.5% (n = 153) had confirmed endometriosis and 23.5% (n = 47) had no endometriosis (controls). In the endometriosis group, 52% (n = 80) of the patients had rASRM stage I–II disease and 48% (n = 73) had stage III–IV disease. The control group comprised women with various benign conditions, 51% (n = 24) of whom had no abnormality. The latter were defined as "discordant" (or complex) patients, i.e., women with symptoms suggestive of endometriosis but without clinical or MRI features of endometriosis and with no endometriosis lesions found at laparoscopic inspection (Table 1).
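The cohort percentages can be reproduced from the reported counts; the stage percentages use the endometriosis group (n = 153) as the denominator and the "no abnormality" percentage uses the control group (n = 47). A short Python check (the helper name `pct` is hypothetical):

```python
def pct(part: int, whole: int, digits: int = 1) -> float:
    """Percentage of `whole` represented by `part`, rounded for reporting."""
    return round(100 * part / whole, digits)

# Counts reported for the ENDOmiARN cohort.
print(pct(153, 200))  # 76.5  endometriosis among all patients
print(pct(47, 200))   # 23.5  controls among all patients
print(pct(80, 153))   # 52.3  rASRM I-II among endometriosis patients
print(pct(73, 153))   # 47.7  rASRM III-IV among endometriosis patients
print(pct(24, 47))    # 51.1  no abnormality among controls
```

Rounded to whole numbers, 52.3, 47.7, and 51.1 give the 52%, 48%, and 51% reported above.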

Figure 2.
Overall composition of processed reads for the saliva samples. RNA reads = miRNAs + piRNAs + rRNAs + tRNAs + mRNAs + other; filtered reads = reads with no adapters + reads with low-quality bases + reads that were too short; not characterized/mappable reads = reads mapped to GRCh38 that could not be assigned to a particular RNA type; not characterized/not mappable reads = reads that could not be mapped.
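The filtered-read categories of Figure 2 can be illustrated with a simplified Python sketch of 3′-adapter trimming and read filtering. Every specific value here is an assumption for illustration, not a setting reported in the study: the adapter sequence (believed to be the QIAseq miRNA 3′ adapter, to be checked against the kit documentation), the 16-nt minimum insert length, the mean Phred threshold of 20, and exact-match adapter detection (production trimmers tolerate mismatches and partial adapters).

```python
from typing import Optional, Tuple

# Illustrative assumptions, not settings reported in the study.
ADAPTER = "AACTGTAGGCACCATCAAT"  # assumed QIAseq miRNA 3' adapter; verify against the kit
MIN_LEN = 16                     # hypothetical minimum insert length (nt)
MIN_MEAN_Q = 20                  # hypothetical minimum mean Phred score

def mean_phred(qual: str) -> float:
    """Mean Phred score of a FASTQ quality string encoded with offset 33."""
    return sum(ord(ch) - 33 for ch in qual) / len(qual)

def classify_read(seq: str, qual: str) -> Tuple[str, Optional[str]]:
    """Return a Figure 2-style category and, for kept reads, the trimmed insert."""
    idx = seq.find(ADAPTER)  # exact match only; real trimmers allow mismatches
    if idx == -1:
        return "no_adapter", None
    insert, insert_qual = seq[:idx], qual[:idx]
    if len(insert) < MIN_LEN:  # also catches adapter dimers (empty insert)
        return "too_short", None
    if mean_phred(insert_qual) < MIN_MEAN_Q:
        return "low_quality", None
    return "kept", insert

if __name__ == "__main__":
    insert = "TGAGGTAGTAGGTTGTGTGGTT"         # toy 22-nt insert (DNA alphabet, as in FASTQ)
    read = insert + ADAPTER + "ACGTACGTACGT"  # insert + adapter + downstream bases
    print(classify_read(read, "I" * len(read)))  # ('kept', 'TGAGGTAGTAGGTTGTGTGGTT')
    print(classify_read(read, "#" * len(read)))  # ('low_quality', None)
```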

miRNA Expression in Patients with and without Endometriosis
Of the 2561 miRNAs identified in the saliva samples, 1.17% (n = 30) were differentially expressed (up- or downregulated) in patients with endometriosis compared with controls. Figure 3 shows a volcano plot of the miRNAs expressed in endometriosis. Among the 30 regulated miRNAs, only three (hsa-miR-34c-5p, hsa-miR-4677-3p, and hsa-miR-655-5p) had an AUC > 0.6. The expression patterns of the top 10 differentially expressed miRNAs in the endometriosis and control groups are shown in Figure 4.
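The coordinates of a volcano plot and a top-10 ranking can be derived from a differential-expression table as in the Python sketch below. How the top 10 miRNAs of Figure 4 were ranked is not stated, so ranking by adjusted p value (ties broken by absolute log2 fold change) is an assumption; the class and function names, miRNA names, and values are hypothetical.

```python
import math
from typing import List, NamedTuple, Tuple

class DEResult(NamedTuple):
    name: str
    log2fc: float
    padj: float

def volcano_points(results: List[DEResult],
                   alpha: float = 0.05) -> List[Tuple[str, float, float, bool]]:
    """(name, x = log2 fold change, y = -log10 adjusted p, significant?) for plotting."""
    return [(r.name, r.log2fc, -math.log10(max(r.padj, 1e-300)), r.padj < alpha)
            for r in results]  # the max() guards against padj == 0

def top_n(results: List[DEResult], n: int = 10) -> List[DEResult]:
    """n miRNAs with the smallest adjusted p, ties broken by larger |log2 fold change|."""
    return sorted(results, key=lambda r: (r.padj, -abs(r.log2fc)))[:n]

if __name__ == "__main__":
    toy = [DEResult("miR-A", 1.2, 0.001),  # hypothetical names and values
           DEResult("miR-B", -0.9, 0.02),
           DEResult("miR-C", 0.1, 0.6)]
    for point in volcano_points(toy):
        print(point)
    print([r.name for r in top_n(toy, 2)])  # ['miR-A', 'miR-B']
```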

Discussion
To the best of our knowledge, this is the first report detailing the miRNAome of 200 saliva samples from patients with and without endometriosis included in a prospective study: the ENDOmiARN study [3,11,16,27]. In addition, we report a bioinformatics approach to saliva miRNA sequencing and analysis and underline the advantages of using saliva over blood in terms of ease of collection, reproducibility, stability, safety, non-invasiveness, and cost-effectiveness [6,8,10,22,28–31].
Preliminary results on the use of saliva RNAs as diagnostic biomarkers have previously been reported, mainly for cancer [6,8,32], systemic disease, and forensic casework [6,9,28,33]. However, the methodological quality and RNA yield of these studies are debatable overall [34]. In a recent literature review of miRNAs for the non-invasive diagnosis of endometriosis, Monnaka et al. underlined that none of the 449 reports investigated miRNAs in saliva [11]. Therefore, since (i) no scientifically proven salivary biomarkers for endometriosis have been reported, and (ii) the applicability of such biomarkers has been poorly explored, extracting and identifying miRNAs from saliva samples for the reliable identification of endometriosis remains challenging [35,36]. The main obstacle to using miRNAs is their potential instability and susceptibility to degradation. This has always been an issue for mRNA-based gene expression analysis and a potential source of bias affecting reproducibility [28,34,37]. This point has been highlighted, for example, in routine forensic applications of miRNA, because biological stains from forensic casework are often exposed to ambient moisture and temperature, UV light, and suboptimal environmental pH, all of which can degrade miRNA beyond usability [28]. In this setting, Patel et al. demonstrated that Oragene•RNA solution could preserve and stabilize RNA collected from saliva, producing high yields of good-quality RNA for subsequent downstream applications and/or analyses [34]. The authors reported that the RNA yield remained fairly constant between matched samples from each donor stored for 48 h at room temperature [34]. They also explored differences in total RNA yield from donors over a 3-day period and examined potential differences in the expression of commonly used mRNA and miRNA endogenous controls [34].
Although the total RNA from each donor varied over the days, probably because of bacterial RNA, the abundance of the mammalian RNA normalizers (U6 small nuclear RNA, 18S rRNA, GAPDH mRNA, and let-7b miRNA) remained stable [34]. In the current study, the 200 saliva samples were collected according to the manufacturer's guidelines (Oragene) and stored at room temperature before shipping and analysis. After read filtering and miRNA identification, ~190 M sequences mapped to 2561 known miRNAs. Total reads per sample ranged from 13 to 39 million, with a mean of 23 million; miRNA reads per sample ranged from 272,322 to 6 million, with a mean of 949,893 (Figure 2A,B). These results agree with previous reports showing that the salivary transcriptome is abundant and stable, consisting of thousands of mRNAs and miRNAs [6,9,10,28,34,38]. In this setting, Courts et al. also confirmed that miRNAs are especially relevant because they are stable and easy to collect and analyze, and validated their use in standard forensic medicine [28]. Using the Oragene•RNA kit, we demonstrated (i) the stability and consistency of miRNA reads across the 200 samples, regardless of sampling and transport conditions; (ii) the reproducibility and efficiency of the technique, since all 200 samples were usable for sequencing; and (iii) a routine bioinformatics approach. In the present study, the F1-score, sensitivity, specificity, and AUC ranged from 11–86.8%, 5.8–97.4%, 10.6–100%, and 39.3–69.2%, respectively. In addition, we identified 30 up- or downregulated miRNAs with highly heterogeneous diagnostic accuracy.
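The ~190 M figure is consistent with the per-sample mean, as a quick Python check shows; the share of raw reads is only approximate because the ~4642 M total is itself rounded.

```python
n_samples = 200
mean_mirna_reads = 949_893       # reported mean miRNA reads per sample
total_raw_reads = 4_642_000_000  # reported ~4642 M raw reads (rounded)

total_mirna_reads = n_samples * mean_mirna_reads
print(total_mirna_reads)                                    # 189978600
print(round(total_mirna_reads / 1e6))                       # 190 (M miRNA reads)
print(round(100 * total_mirna_reads / total_raw_reads, 1))  # 4.1 (% of raw reads, approximate)
```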
Although the use of saliva for miRNA identification could be a potential non-invasive solution to overcome current barriers to the diagnosis of endometriosis, the critical step is the transition from expression data to candidate selection, which is always somewhat arbitrary. In this setting, salivary miRNAs have been reported to be of great interest as diagnostic biomarkers, especially in cancer [6,10,32]. However, as there are no fixed rules about which criteria to apply when selecting a miRNA candidate, we developed a bioinformatics approach for assessing miRNA accuracy: among the 2561 miRNAs identified, 30 were up- or downregulated, which underscores the need for new mathematical methods and artificial intelligence to overcome the limits of classic logistic regression. Indeed, in agreement with Lopez-Rincon et al., it is illusory to imagine that a few miRNAs could reflect the heterogeneity of a multifactorial disorder such as endometriosis, which has various phenotypes and whose pathogenic pathways are poorly understood [7,15,39]. We therefore used machine learning, a newer statistical tool, to overcome these accuracy limitations and design a potential diagnostic signature [7,9,10,15,30].
In the present study, we analyzed 200 saliva samples for miRNA expression and diagnostic accuracy. However, several unsolved issues might hinder the broad acceptance of a miRNA-based signature. The miRNAome, perhaps even more than the transcriptome, is highly context dependent, and it is conceivable that certain non-physiologic or pathologic conditions might alter the expression levels of miRNAs used for body-fluid identification. It will therefore be necessary to test whether the expression of candidate miRNAs is influenced by biologic processes or conditions such as the menstrual cycle phase or previous hormonal treatment [12,40]. In this setting, Vanhie et al. and Moustafa et al. reported no impact of hormonal treatment or menstrual cycle phase on miRNA expression, in contrast to data obtained from endometrial biopsies [12,40]. This apparent discrepancy could be linked to the modalities of miRNA release into bodily fluids, which could vary depending on the organ and the tumor. In the ENDOmiARN study, two different body fluids were assessed: serum and saliva. This choice was mostly driven by the need for stable miRNAs to provide a reliable diagnostic tool. Indeed, while several studies have observed differences in miRNA expression in tissues according to the menstrual phase, mainly at the endometrial level [41,42], no such cyclic differences were observed in the plasma of healthy women [43]. One hypothesis is that changes in miRNA expression at the endometrial level regulate gene expression locally but are insufficient to cause detectable systemic changes [3]. The other reason to opt for saliva was its easy availability, including through home self-sampling and in virgin patients who are not examined during gynecological appointments. Another issue is the variability of miRNA expression analysis depending on the next-generation sequencing (NGS) technique used.
Indeed, several different methods and devices for miRNA extraction, reverse transcription, and quantification, from NGS to microarray analysis, have been advocated, leading to differences in results [13,31,34]. However, as underlined by Agrawal et al., we believe that the standardized NGS procedure described here is optimal for endometriosis, since NGS is currently the gold-standard approach for profiling nucleic acids, including miRNAs [3]. In addition, miRNAs are just one of several classes of small ncRNAs with regulatory functions, and there is no reason to exclude these other RNAs from endometriosis analyses. miRNA analysis may therefore represent an interim strategy until more is known about other small RNAs; once comprehensive small-RNA analysis is available, it is likely to replace miRNA-only analysis [44]. Finally, our results require external (temporal and geographic) validation of miRNA quantification and sequencing reproducibility, which is the goal of an ongoing study [45].

Conclusions
Endometriosis affects about 190 million women worldwide, representing a healthcare burden equivalent to that of diabetes [46]. Endometriosis is a representative example of a multifactorial chronic disease that is not completely understood. To understand the various signaling pathways involved in this complex disease, analysis of the entire currently available miRNAome is mandatory. This report describes the whole saliva transcriptome with the aim of making miRNA quantification a validated, standardized, and reliable technique for routine use. The methodology could be applied to build a saliva signature of endometriosis, to address other issues of this debilitating disorder (various clinical phenotypes, infertility-associated endometriosis), and to evaluate the potential theragnostic value of miRNA expression. Finally, beyond endometriosis, our methodology can be applied to other chronic diseases with the goal of developing a non-invasive, quick, and reliable tool to improve diagnosis and management and to select patients according to their response to medical and/or surgical treatment.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The authors state that the data used are from the prospective ENDOmiARN study (ClinicalTrials.gov Identifier: NCT04728152). Data collection and analysis were carried out under Research Protocol ID RCB no. 2020-A03297-32.
Acknowledgments: All authors would like to sincerely thank F. Neilson (matrixconsultants.fr) for her English revision of the manuscript.

Conflicts of Interest:
S. Suisse is a former employee of Ziwig, Inc. The remaining authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript.